Online heuristic planning for highly uncertain domains
Authors
Abstract
Heuristic search algorithms for online POMDP planning have shown great promise in creating successful policies for maximizing agent rewards, using heuristics that typically focus on reducing the error bound in the agent’s estimates of cumulative future reward. However, error bound-based heuristics are less informative in highly uncertain domains that require long sequences of information gathering, such as robotics. In these domains, all possible plan improvements look similar under error bound-based heuristics until the agent’s belief uncertainty has been resolved, leaving the agent initially unsure how best to improve its plan under the real-time constraints of online planning. We propose (1) a novel heuristic guiding the agent towards policies that first reduce the agent’s belief uncertainty, after which error bound-based heuristics are more effective, and (2) a novel selection mechanism for choosing which type of heuristic (error bound or uncertainty-based) to use during the current stage of planning to most quickly form a good plan. We evaluate our solution on several benchmark POMDP problems, demonstrating that it yields successful policies with less planning time in highly uncertain domains and comparable performance in simpler problems.
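The abstract does not specify how belief uncertainty is measured or how the selection mechanism decides between heuristics, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes Shannon entropy as the uncertainty measure, a fixed entropy threshold as the switching rule, and fringe nodes stored as plain dictionaries; the names `belief_entropy`, `choose_heuristic`, `pick_node`, and `entropy_threshold` are all hypothetical and not taken from the paper.

```python
import math


def belief_entropy(belief):
    """Shannon entropy (in bits) of a discrete belief (list of probabilities)."""
    return -sum(p * math.log2(p) for p in belief if p > 0.0)


def choose_heuristic(root_belief, entropy_threshold=1.0):
    """Assumed switching rule: use the uncertainty-based heuristic while the
    current belief is still diffuse, otherwise fall back to the error bound."""
    if belief_entropy(root_belief) > entropy_threshold:
        return "uncertainty"
    return "error_bound"


def pick_node(root_belief, fringe, entropy_threshold=1.0):
    """Pick the fringe node to expand next.

    Each fringe node is a dict with hypothetical keys:
    'name', 'belief', 'upper', 'lower' (value bounds).
    """
    mode = choose_heuristic(root_belief, entropy_threshold)
    if mode == "uncertainty":
        # Assumption: favor the node whose belief is most resolved.
        best = min(fringe, key=lambda n: belief_entropy(n["belief"]))
    else:
        # Error bound-based choice: expand where the bound gap is largest.
        best = max(fringe, key=lambda n: n["upper"] - n["lower"])
    return mode, best["name"]


fringe = [
    {"name": "a", "belief": [0.5, 0.5, 0.0, 0.0], "upper": 10.0, "lower": 2.0},
    {"name": "b", "belief": [0.9, 0.1, 0.0, 0.0], "upper": 6.0, "lower": 5.0},
]
print(pick_node([0.25, 0.25, 0.25, 0.25], fringe))
print(pick_node([0.9, 0.1, 0.0, 0.0], fringe))
```

With a uniform root belief the sketch selects the uncertainty-based heuristic and picks the node with the more concentrated belief; once the root belief is concentrated, it switches to the bound gap. A real planner would likely weight nodes by reachability and adapt the switching rule rather than use a fixed threshold.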
Related resources
Domain and Plan Representation for Task and Motion Planning in Uncertain Domains
As robots become more physically robust and capable of sophisticated sensing, navigation, and manipulation, we want them to carry out increasingly complex tasks. A robot that helps in a household must plan over the scale of hours or days, considering abstract features such as the desires of the occupants of the house, as well as detailed models that support locating and getting objects, whether...
An anytime approach for on-line planning
In this paper we present a novel planning approach, based on well-known techniques such as goal decomposition and heuristic planning, aimed at working in highly dynamic environments with time constraints. Our contribution is a domain-independent planner to incrementally generate plans under a deliberative framework for reactive domains. The planner follows the anytime principles, i.e., a first sol...
FHHOP: A Factored Hybrid Heuristic Online Planning Algorithm for Large POMDPs
Planning in partially observable Markov decision processes (POMDPs) remains a challenging topic in the artificial intelligence community, in spite of recent impressive progress in approximation techniques. Previous research has indicated that online planning approaches are promising in handling large-scale POMDP domains efficiently as they make decisions “on demand” instead of proactively for t...
Conditional Planning under Partial Observability as Heuristic-Symbolic Search in Belief Space
Planning under partial observability in nondeterministic domains is a significant and challenging problem, which requires dealing with uncertainty together with and-or search. In this paper, we propose a new algorithm for tackling this problem, able to generate conditional plans that are guaranteed to achieve the goal despite the uncertainty in the initial condition and the uncertain ef...
Online planning for large MDPs with MAXQ decomposition
Markov decision processes (MDPs) provide an expressive framework for planning in stochastic domains. However, exactly solving a large MDP is often intractable due to the curse of dimensionality. Online algorithms help overcome the high computational complexity by avoiding computing a policy for each possible state. Hierarchical decomposition is another promising way to help scale MDP algorithms...